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Abstract 

We discuss the application of multilevel Monte Carlo methods to elliptic partial differential 
equations with random coefficients. Such problems arise, for example, in uncertainty quantifi- 
cation in subsurface flow modeling. We give a brief review of recent advances in the numerical 
analysis of the multilevel algorithm under minimal assumptions on the random coefficient, and 
extend the analysis to cover also tensor-valued coefficients, as well as point evaluations. Our 
analysis includes as an example log-normal random coefficients, which are frequently used in 
applications. 

1 Introduction 

There are many situations in which modeling and computer simulation are indispensable tools and 
where the mathematical models employed have been demonstrated to give adequate representations 
of reality. However, the parameters appearing in the models often have to be estimated from 
measurements and are, therefore, subject to uncertainty. This uncertainty propagates through the 
simulations and quantifying its impact on the results is frequently of great importance. 

A good example is provided by the problem of assessing the safety of a potential deep geolog- 
ical repository for radioactive wastes. Any radionuclides leaking from such a repository could be 
transported back to the human environment by groundwater flowing through the rocks beneath 
the earth's surface. The very long timescales involved mean that modeling and simulation are 
essential in evaluating repository performance. The study of groundwater flow is well established, 
and there is general scientific consensus that in many situations Darcy's Law can be expected to 
lead to an accurate description of the flow [8]. The main parameter appearing in Darcy's Law is 
the permeability, which characterizes how easily water can flow through the rock under a given 
pressure gradient. In practice it is only possible to measure the permeability at a limited number 
of spatial locations, but it is required at all points of the computational domain for the simulation. 
This fact is the primary source of uncertainty in groundwater flow calculations. Understanding and 
quantifying the impact of this uncertainty on predictions of radionuclide transport is essential for 
reliable repository safety assessments. 

A widely used approach for dealing with uncertainty in groundwater flow is to represent the 
permeability as a random field [10\ |9]. A model frequently used is a log-normal random field, 
with a covariance function that is only Lipschitz continuous. Individual realizations of such fields 
have low spatial regularity and significant spatial variation, making the problem of solving for 
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the pressure very costly. The notoriously slow rate of convergence of the standard Monte Carlo 
algorithm means that many such realizations are required to obtain accurate results, rendering the 
problem computationally unfeasible. 

In this paper, we therefore employ the multilevel Monte Carlo (MLMC) method. This method 
was first introduced by [12] in the context of stochastic differential equations in finance, and similar 
ideas were also used by [17] and [2]. In the context of our groundwater flow model problem, it was 
shown in for example [7] and [20j , that the multilevel method leads to a significant reduction in the 
computational cost required to achieve a given accuracy. 

The main challenge in the numerical analysis of MLMC methods for elliptic partial differential 
equations (PDEs) with random coefficients, is the quantification of the numerical discretization 
error, or in other words the bias of the estimator. Models for the random coefficient frequently used 
in applications, such as log-normal random fields, are not uniformly coercive, making the numerical 
analysis challenging. A rigorous analysis of the MLMC algorithm under minimal assumptions on 
the random coefficient was recently carried out by [5] and [20]. In particular, uniform coercivity 
or boundedness were not assumed in these papers. If one does assume uniform coercivity and 
boundedness of the coefficient, the analysis of the discretization error is classical, and an analysis 
of the MLMC method for this case can be found in [lj. Other related works on numerical errors 
for elliptic PDEs with random coefficients are [1] and |13j . 

The aim of this paper is to extend the theory in |20j . We here consider the case of more general, 
tensor-valued models of the permeability, which are often used in applications to model orthotropic 
media. We will also prove convergence of the MLMC algorithm for point evaluations of the pressure 
or the Darcy flux. 

The outline of the paper is as follows. In fj2[ we describe the multilevel Monte Carlo algorithm 
applied to elliptic PDEs with random coefficients, and discuss its performance. In £j3j we then 
prove an upper bound on the computational cost of the multilevel Monte Carlo estimator. We 
recall some of the main results from ( [U [20] ) , before extending the results to tensor-coefficients and 
point evaluations. 

2 Multilevel Monte Carlo Simulation 

The classical equations governing a steady state, single phase flow, are Darcy's law coupled with 
an incompressibility condition. These equations can be written in second order form as 

- div (AVm) = /, in D C R d , (2.1) 

subject to appropriate boundary conditions. Here, A is the permeability tensor, u is the resulting 
pressure field, and / are the source terms. Modeling A as a random field, u also becomes a random 
field. 

In applications, one is then usually interested in finding the expected value of some functional 
Q = M{u) of the solution it to our model problem (|2.ip . This could for example be the value of 
the pressure u or the Darcy flux -AVm at or around a given point in the computational domain, 
or outflow over parts of the boundary. Since u is not easily accessible, Q is often approximated by 
the quantity Qh '■= M(uh), where Uh is a finite dimensional approximation to u, such as the finite 
element solution on a sufficiently fine spatial grid Th- 

To estimate E [Q] , we then compute approximations (or estimators) Qh to E [Qh] , and quantify 
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the accuracy of our approximations via the root mean square error (RMSE) 

1 /2 

e(Q h ):=(E[(Q h -E(Q)) 2 ]) . 

The computational cost C e (Qh) of our estimator is then quantified by the number of floating point 
operations that are needed to achieve a RMSE of e(Qh) < e - This will be referred to as the e-cost. 
The classical Monte Carlo (MC) estimator for E [Qh] is 

^ : =^X>(- (i) )> 

i=l 

where Qft(wW) is the ith sample of Qh and N independent samples are computed in total. 

There are two sources of error in the estimator ([2]), the approximation of Q by Qh, which is 
related to the spatial discretisation, and the sampling error due to replacing the expected value 
by a finite sample average. This becomes clear when expanding the mean square error (MSE) 
and using the fact that for Monte Carlo E[Qg£] = E[Q h ] and V[Q$J] = iV _1 V[Q h ], where 
V[X] := E[(X — E[X]) 2 ] denotes the variance of the random variable X : $7 — s> R. We get 

e(QMC )2 = N -i Y [Q h ] + (E[Q h -Q]) 2 . (2.2) 

A sufficient condition to achieve a RMSE of e with this estimator is that both of these terms are 
less than e 2 /2. For the first term, this is achieved by choosing a large enough number of samples, 
N = 0(e~ 2 ). For the second term, we need to choose a fine enough finite element mesh Th, such 
that E[Q h -Q] = 0[e). 

The main idea of the MLMC estimator is very simple. We sample not just from one approxi- 
mation Qh of Q, but from several. Linearity of the expectation operator implies that 

L 

E[Q h ] = E[Q h0 ] + J>[Q^ ~ Q/*_J 
t=i 

where \he}t=o ... l are the mesh widths of a sequence of increasingly fine triangulations Th e with 
7ft := 7~h L , the finest mesh. Hence, the expectation on the finest mesh is equal to the expectation 
on the coarsest mesh, plus a sum of corrections adding the difference in expectation between 
simulations on consecutive meshes. The multilevel idea is now to independently estimate each of 
these terms such that the overall variance is minimized for a fixed computational cost. 

Setting for convenience Yq := Qh and Y^ := Qh e — Qh t ^ x i for 1 < t < L, we define the MLMC 
estimator simply as 

L 

nML ._ V^yMC 
^h,{N e } ■- l^ Y t, N e> 
1=0 

where Yffi t is the standard MC estimator for Ye, 

yMC _ 1 X^VJ,,{i)\ 

1 8=1 



3 



Here, it is important to note that Yi(u)®) = Qh e {w^) — Qh e _ 1 (u^), i.e. the quantity Y^(u;W) is 
computed using the same sample on both meshes. 

Since all the expectations E[Y^] are estimated independently in ([2]), the variance of the MLMC 
estimator is Yle=o V[i^] and expanding as in (|2.2ft leads again to a MSE of the form 

<QhM? ■= ^[(Qk!{N e} -nQ]) 2 ] = + (nQh-Q]) 2 . 

As in the classical MC case before, we see that the MSE consists of two terms, the variance of the 
estimator and the error in mean between Q and Q^. Note that the second term is identical to the 
second term for the classical MC method in (12. 2p . A sufficient condition to achieve a RMSE of e is 
again to make both terms less than e 2 /2. This is easier to achieve with the MLMC estimator, as 

• for sufficiently large ho, samples of Qh are much cheaper to obtain than samples of Q\{, 

• the variance Yi tends to as he — > 0, meaning we need fewer samples on The > f° r ^ > 0. 

Let now Ci denote the cost to obtain one sample of Qh e • Then we have the following results on 
the e-cost of the MLMC estimator (cf. [UCDJ). 

Theorem 2.1. Suppose there are positive constants a, (3, 7, c M1 , c M2 , c M3 > such that a > ^ min(/3, 7) 
and 

Ml. \E[Q h -Q]\ < c M1 h a , 
M2. V[Q he - Q hl _ x ] < c M2 hi, 
M3. C £ < c M3 hp, 

Then, for any e < e _1 , there exist an L and a sequence {Ni}f =Q , such that e(Q^ }) ^ e an< ^ 



e" 2 , if P>7, 

£- 2 (loge) 2 , if /3 = 7, 

e -2-( 7 -«/a j j/ /3 < 7, 



where the hidden constant depends on c M1 , c M2 and c M3 . For the classical MC estimator we have 
C £ (Q^ C ) < e _2_7 / a , where the hidden constant depends on c M1 and c M3 . 

The convergence rates a and /3 in Theorerr i2.ll are related to the convergence of the spatial 
discretization error, and have been proven for various quantities of interest in [5] and |20] , In £|3.2| 
we further extend this theory to point evaluations of the pressure and the flux. Typical values of 
a and f3 for the model problem considered in this paper are a = 1 and (3 = 2 for rough models 
of the permeability and a = 2 and j3 = 4 for smoother models (this is made more precise in S|3]). 
The rate 7 is related to the cost of numerically solving the PDE for one realization of the random 
coefficient. This involves producing a sample of the random coefficient, and solving a linear system 
of equations. The cost of solving the linear system will generally be dominant, and with an optimal 
linear solver, the cost of one such solve is proportional to h7 d , the number of unknowns, and so 
7 ~ d. 
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In Table [H we show the e-costs as predicted by Theorem 12. 11 for typical values of a and f3. We 
assume an almost optimal linear solver, and take 7 to be slightly larger than d. We see that the 
gains we can expect from using the MLMC estimators are always significant, usually in the order 
of two orders of magnitude. It is also worth noting that although the actual e-costs are higher in 
the case of the rough model problem with a = 1 and /3 = 2, the gains we can expect from MLMC 
are also greater in this case. 

Table 1: Upper bounds for the e-costs of classical and multilevel Monte Carlo from Theorem l2.1l in 
the cases a = 1, (3 = 2 (left) and a = 2, j3 = 4 (right), with 7 = d + 5 in both cases, where 5 > is 
a small constant, d is the spatial dimension from (|2.ip . 





a = 


1,(3 = 2 


a = 2, 


/3 = 4 


d 


MC 


MLMC 


MC 


MLMC 


1 


e- 3 


e" 2 


£ -5/2 


e- 2 


2 


e" 4 


e- 2 


e- 3 


e- 2 


3 


e" 5 


e- 3 


£ -7/2 


e" 2 



The reduction in cost associated with the MLMC estimator over standard MC is largely due 
to the fact that the number of samples needed on the finer grids is greatly reduced. Most of the 
uncertainty can already be captured on the coarse grids, and so the MLMC estimator shifts some of 
the computational effort on to the coarse grids. Exactly how much of the computational effort can 
(and should) be shifted towards the coarse grids, depends on the model problem and the quantity 
of interest Q. The MLMC algorithm described above chooses the number of samples on each level 
in such a way that the computational cost of the estimator is minimized, subject to the overall 
variance of the estimator being less than e 2 /2. This can lead to three different scenarios: the 
computational cost could be predominantly on the coarse levels, spread evenly across the levels, or 
predominantly on the fine levels. This corresponds to the three upper bounds given in Theorem 12. II 
above. 

To make this more precise, note that for given {N^} and {Q}, the computational cost of the 
MLMC estimator is 

L 

1=0 

Treating the Ni as continuous variables, the cost of the MLMC estimator is minimized for a fixed 
variance by choosing 

with the constant of proportionality chosen so that the overall variance is e 2 /2. The total cost on 
level i is then proportional to yV[$^] Q, and hence 

If the variance V[Y^] decays faster with £ than the cost Ci increases, i.e. if (3 > 7, the dominant term 
will be on level 0. Similarly, if V[Yg] decays slower than Ce increases, the dominant term will be on 
the finest level L, and if V[Y^] decreases at the same rate as Ci increases, the cost is spread evenly 



5 



across all levels. In the context of our model problem in subsurface flow, we are usually in the last 
regime, where (3 < 7. Especially in 2 or 3 space dimensions, the cost of obtaining one sample grows 
very rapidly, and the dominant cost will always be on the finest level. It is worth to note that if 
f3 = 2a, and like here, j3 < 7, the cost of the MLMC estimator is of the order e _7//o . This is in 
fact the same cost as taking only one sample on the finest grid, since we have to choose h tz e a to 
get a MSE of 0(e 2 ), and the cost of one solve is then C < /i~ 7 = £~ 7//q , by assumption M3. This 
means that asymptotically our multilevel Monte Carlo method for the stochastic problem has the 
same complexity as a deterministic solver for one realization of the same problem. 

Another issue which influences the cost of the MLMC estimator, is the choice of the coarsest 
mesh size ho- The bigger ho is, the more levels we can include in the MLMC estimator, and the 
bigger the potential gains are with respect to standard MC. Although the choice of ho does not 
influence the asymptotic bounds on the cost given in Theorem I2.lt the choice of ho does have 
an effect on the absolute cost of the MLMC estimator for any fixed accuracy e. In practical 
applications, ho must often be chosen to give a minimal level of resolution to the problem in order 
to get the MLMC estimator with the smallest absolute cost. For the model problem in subsurface 
flow, where the permeability varies on a very fine scale and is highly oscillatory, very coarse meshes 
do not yield a good representation of the problem, and including them in the MLMC estimator 
can lead to a larger absolute cost than necessary. One way to circumvent this problem, is to use 
smoother representations of the permeability on the coarse levels. It was shown in [20] that, without 
introducing any additional bias in the MLMC estimator, this strategy allows for the inclusion of 
much coarser levels, and hence gives a significantly lower absolute cost of the MLMC estimator, 
even in the context of short correlation lengths. 

The rest of the paper is devoted to proving theoretical convergence rates, and thus justifying 
assumptions Ml and M2 in Theorem 12.11 

3 Numerical Analysis 

For simplicity, we consider a particular instance of model problem (12.ip . posed on a Lipschitz- 
polygonal domain Del 2 and with homogeneous Dirichlet conditions: Given a probability space 
(Q, A, P) and uj € O, find «:!lxC->l such that 

—div(A(co,x)Vu(uj,x)) = f(uj,x), for x E D, (3.1) 
u(ou, x) = 0, for x € Tj . 

The differential operators div and V are with respect to x € D, and T := UjLiTj denotes the 
boundary of D, partitioned into straight line segments. Note that due to the tensor-valued coeffi- 
cient A(uj,x), this problem is more general than those studied in our earlier papers [5] and |20| . It 
is of course possible to consider other boundary conditions and/or higher spatial dimensions, and 
this is done for example in |20] , One can also include lower order terms in the differential operator. 

We will carry out a finite element error analysis of (13. ip . under minimal assumptions on the 
coefficient tensor A and on the source term /. In particular, we do not assume that A is coercive 
and bounded uniformly in u, since this is not the case for example for log-normal random fields, 
which can take values arbitrarily close to zero or infinity for any given realization. The crucial 
observation is that for each fixed u, we have a uniformly coercive and bounded problem (in x). The 
first step in our error analysis is therefore to derive an estimate on the finite element error for a 
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fixed uj. However, in order to be able to compute moments (expectations) of the error, it is crucial 
that we keep track of how all the constants that appear in our estimates depend on uj, or in other 
words on A and /. 

The coefficient tensor A(w, •) is assumed to take values in the space of real-valued, symmetric 

1 /2 

d x d matrices. Given the usual norm \v\ := (j>2i = i \ v i l) on we choose the norm on M dxd as 
the norm induced by | • |, or any matrix norm equivalent to it. 
For all uj £ £1, let now A m i n (u;) be such that 

A(w,x)£ • f > A min H|£| 2 , V£ G R d , uniformly in x G D, 

and define 

A max (u;) := ||A(w, •)llc(S,K dxd ) - 



We make the following assumptions on the input data: 
Al. A m i n > almost surely and 1/A m ; n G LP(Q,), for all p G (0, oo). 

A2. Aij G C*(I5),i,j = l,...,d, and A G L p (0,C"(D, M dxrf ))), for some < t < 1 and for all 
p G (0,oo). 

A3. / G i7* (ft, H*- 1 (£>)), for some p* G (0,oo]. 

Here, the space C t (D, M. dxd ) is the space of <i x d matrix-valued, Holder-continuous functions 
with exponent t, H S (D) is the usual fractional order Sobolev space, and L q (fl,B) denotes the 
space of £>-valued random fields, for which the g th moment (with respect to the measure P) of the 
i3-norm is finite, see e.g [5]. A space which will appear in the error analysis later is the space 
L q (Q, Hq(D)), which denotes the space of Hq (D)-valued random fields with the norm on Hq(D) 
being the usual ^(D) -seminorm | • \ijim)- It is possible to weaken Assumptions Al and A2 to 
1/A m i n and ||A|| ct ^ K dxd) having finite moments of order p a , for some p a G (0, oo), but we will 
not do this here for ease of presentation. 

An example of a random tensor A(w, x) that satisfies Assumptions Al and A2, for all p G (0, oo), 
is a tensor of the form A = exp(gi)Ki + exp (52)^2, where g\ and 52 are real- valued Gaussian 
random fields with a Holder-continuous mean and a Lipschitz continuous covariance function, and 
K\ and K2 are deterministic tensors satisfying (deterministic versions of) assumptions A1-A2. For 
example, gi,i = 1,2, could have constant mean and an exponential covariance function, given by 



E 



{gi{uj,x) -M[gi(aj,x)])(gi(uj,y) - E[&(w, y)]) = a 2 exp(-||x - y\\/X) 



where a 2 and A are real parameters known as the variance and correlation length, and || • || denotes a 
norm on M. d . It follows from the results in [S] that the resulting random tensor satisfies assumptions 
A1-A2, for any t < 1/2. If we instead choose a smoother covariance function, like the Gaussian 
covariance 



E 



(gi(uj,x) - E[gi(uj,x)})(gi(u,y) - E[gi(u),y)]) = cr 2 exp(-||x - y|| 2 /A 



the resulting random tensor A satisfies assumptions A1-A2 with t = 1. 
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To simplify the notation in the following, let < Cxj < 00 denote a generic constant which 
depends algebraically on L 9 (r2)-norms of A max , 1/A min , ||A|| ct ^ RdX d) and H/Hift-ip), witn Q < 
p* in the case of ||/||jft-i(£))- We will also use the notation b < c for two positive quantities b and 
c, if b/c is uniformly bounded by a constant independent of A, / and h. 

We will study the PDE (|3.ip in weak (or variational) form, for fixed u G 0. This is not possible 
uniformly in f2, but almost surely. In the following we will not explicitly write this each time. With 
f(u, •) G H t ~ 1 (D) and < A m i n (w) < A max (u;) < 00, for all x G D, the variational formulation of 
(|3.ip . parametrized by a; G O, is 

6 w (ti(w, ■),«)= L u (v) , for all i> G AqP), (3-2) 

where the bilinear form b u and the linear functional L w (both parametrized by oj G SI) are defined 
as usual, for all u,v £ Hq(D), by 

& w (u,v):=y A(o;,x)Vn(x) • Vt;(x)dx and L w (u) := (/(w, •), v )H t - 1 (D),H^- t (D) ■ 

We say that for any cj G f2, u(w, •) is a weak solution of (|3.ip iff n(o;, •) G Hq(D) and satisfies (|3.2j) . 
The following result is classical. It is based on the Lax-Milgram Lemma (cf |16j). 



Lemma 3.1. For almost all uj G Q, the bilinear form b w {u,v) is bounded and coercive in Hq(D) 
with respect to \ ■ \h 1 (d)> with constants A max (o;) and A min (uj), respectively. Moreover, there exists 
a unique solution u(cj, •) G Hq(D) to the variational problem (|3.2p and 

If \ ^ \\f(vr)\\H-HD) 

|n(w '-^ 1(D) - A" h • 

We now consider finite element approximations of our model problem (|3,ip using standard, 
continuous, piecewise linear finite elements. This is not the only possible choice, and the MLMC 
estimator works equally well with other spatial discretizations. See for example [7j for results with 
finite volume discretizations, and [T3] for an error analysis in the case of mixed finite elements. 

Denote by {Th}h>o a shape-regular family of simplicial triangulations of the domain D, parametrized 
by its mesh width h := m&x Te -r h diam(r). 

Associated with each triangulation Th we define the space 

Vh '■= {vh G C(D) : Vh\ T linear, for all r G Th, and Vh\r = 0} 

of continuous, piecewise linear functions on D that vanish on the boundary. 

The finite element approximation of u in Vh, denoted by Uh, is now found by solving 

b u [u h (uj, -),v) = L u (v) , for all v G Vh, 

The key tools in proving convergence of the finite element method are Cea's lemma and a best 
approximation result (cf [T6| 13]): 

Lemma 3.2 (Cea's Lemma). Let Assumptions A1-A3 hold. Then, for almost all 

(A (^) \ ^ 
A maX / \ j inf \u(u,-) -v h \ H i(D)- 
A m i n (w) / v h ev h 
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Lemma 3.3. Let v G H 1+S (D) n H^(D), for some < s < 1. T/ien 

where the hidden constant is independent of v and h. 

In order to conclude on the the convergence of u to Uh in the L P (Q, Hq (D))-norm, or in other 
words on the convergence of moments of the if 1 (D)-seminorm of the error, it is hence crucial that 
we can bound moments of IMIifi+sm), for some < s < 1. The spatial regularity of u depends 
both on the regularity of A and /, and on the geometry of the domain D, so we need the following 
definition in addition to assumptions A1-A3. 

Definition 3.4. Let < Xa(D) < 1 be such that for any < s < Xa{D),s ^ |, the Laplace 
operator A is surjective as an operator from H 1+S (D) n Hq(D) to H s ~ l (D). In other words, 
let Aa(-D) be no larger than the order of the strongest singularity of the Laplace operator with 
homogeneous Dirichlet boundary conditions on D. 

In general, the value of Xa(D) depends on the geometry of D, and the type of boundary 
conditions imposed. For the particular model problem (13. ip . we have that Xa(D) = 1 for convex 
domains. For non-convex domains, we have Aa(-D) = mm'JL 1 TT/6j, where 9j is the angle at corner 
Sj, and m is the number of corners in D. Hence, Aa(-D) > 1/2 for any Lipschitz polygonal domain. 

In the particular case of scalar coefficients, the following result was proven in [20] . 



Theorem 3.5. Suppose A = aid, for some a : x D — » R, and let Assumptions A1-A3 hold for 
some < t < 1. Then, 

„ , M| A max (a;)||A(,;,-)||^ dx 

Hw,-)||ffi+«(U) < T 7-T4 \\f\\H*-HD), 

for almost all to € Vt and for all < s < t such that s < Xa(D). Hence, 

\\ u ~ u h\\LP(n,H^D)) ^ C A,fh s , forallp<p*, 

with Caj < 00 a constant that depends on the input data, but is independent of h. If A1-A3 hold 
with t = Xa{D) = I, then \\u - u h \\ LP ^ H i {D)) < C A jh. 

From Theoreix l3.51 one can easily deduce convergence rates a and {3 for Theorem 1, for Q = 
\u\ H i( D )- Assume > 2, and < t < 1. Using the reverse triangle inequality, we get 

E [| Hm{D) ~ \uht \m(D)\] < E [\u - u hl \ H i {D) ] = \\u- u he \\ L i { n :H i {D)) < C A j hf, (3.3) 

and so a = s. Similarly, using Y(X) < E [^ 2 ] , the reverse triangle inequality, the triangle inequality 
and E [X 2 ] = \\X\\ 2 L2 ^ n y we have 

v [\\ u h e \m(D) ~ \v>h t -i\w{D)\] < E {\ u h e ~ u hi ^\ H i {D) ) 2 <C A jhj s , (3.4) 

and so f3 = 2s. If t = 1, one can similarly show that assumptions M1-M2 are satisfied with a = 1 
and = 2. 



9 



As in the deterministic setting, one can use Theorerr i3.51 together with a duality argument, 
to prove convergence of the finite element error for other quantities of interest. These quantities 
include ||"u||l 2 (d); f° r which one can prove convergence rates twice those of the i7 1 (D)-seminorm 
(see [5]), and all functionals which are continuously Frechet differentiable, for which one can prove 
convergence rates up to twice those of the // 1 (Z))-seminorm, depending on the functional (see |20j). 

The remainder of this section will be devoted to extending the theory above. In §3.14 we prove 
an analogue of Theoreir l3.5l in the case of more general tensor coefficients A. In §3.21 we prove 
convergence of a functional that does not fit into the framework of functionals covered in [20J, 
namely point evaluations. 

3.1 Regularity of the Solution 

The main result in this section is that Theorerr i3.5l holds also for more general tensor coefficients. 
Theorem 3.6. Let Assumptions A1-A3 hold for some < t < 1. Then, 

„ , mi . A — HIIAKOII^^^x.) 
\\u(u,-)\\ H i+s {D) < — \\f\\Ht-i(D), 

for almost all uj G Q, and for all < s < t such that s < Xa(D). Hence, u G L P (Q, H l+S (D)), 
for any p < p*. If A 1- A3 hold with t = Aa(-D) = 1, then the above bound holds with s = 1, and 
u G LP(n,H 2 (D)). 

The proof of Theorer d3.6l is very similar to the proof in the case of scalar coefficients, which can 
be found in full in |20} §5] and j5j §A]. We will therefore only give the final result, together with 
the main ideas of the proof, in this section. For a detailed proof, see |19j . 

Proof of Theorer \3. 6[ (Sketch) The proof follows closely that of fSDJ §5.1]. We denote by A w the 
differential operator — div(AV-). From a result in perturbation theory, it suffices to show that there 
exists a constant C scini ((jj) such that 

IMIiri+.(D) < C aem ^)\\A u v\\ H s-i (D) , for all v G H 1+S (D) n £#(£>), (3.5) 

in order to conclude that u(u, •) G H 1+S {D). To prove the existence of such a constant C scmi (w), 
we combine regularity results for operators with constant coefficients in polygonal domains, with 
regularity results for operators with variable coefficients in smooth domains. 

We first choose a smooth (C 2 ) domain D 1 C D, which roughly speaking coincides with D away 
from the corners, and does not contain any of the corners. A slight generalization of the proof in 
[5j §A], establishes the required result (13. 5p for all functions w G H 1+S (D') n Hq(D'). 

Secondly, in order to characterize the behavior of the function v near the corners, we let W be 
a polygonal sub domain of D, which includes some corner Sj. To prove (13. 5ft in W, we first show 
that 

A min (w) \\w\\ H i+s {w) < \\Alw\\ H s-i {w) , for all w G H l+s {W) n H^(W), 

where Aij is the operator A^, with coefficients frozen at the corner Sj. This is done by using the 
Cauchy-Schwartz and the Poincare inequalities, together with the definition of A m ; n (u;). 
Using the triangle inequality, we then have 

A min (w) |H|#i+ s(W r) < + \\A W W - Alw\\ H s-i(w)) , 
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and so the crucial step is now to bound — ■AI) w \\h s - 1 (W)' This is done by showing that 

this difference can be bounded in terms of ||A — A(Sj)\\ cl y^ Rdxd) and ||A|| ct /pp Rdxd), which, by 
our regularity assumption A2, can be made arbitrarily small by making W arbitrarily small. This 
establishes (J33]| for functions w € H 1+S (W) n H£(W). 

The final estimate (|3.5p . can then be deduced by combining the two results with the help of 
suitable cut-off functions. The final result is that (13.51) holds with 



c = Amax(^)||A(^,-)||^, t( ^ Rdxd) 

A m i n (u;)4 



□ 



3.2 Convergence of Point Evaluations 



The aim of this section is to derive bounds on moments of \\{u — Uh)(u,')\\L°°(D) an d ||(« — 
Uh)(uj, •)\\w 1 '°°{D)- This will give us convergence rates of the finite element error for point eval- 
uations of the pressure u and the Darcy flux -AVu. A classical method used to derive these 
estimates, is the method of weighted Sobolev spaces by Nitsche. The results presented in this sec- 
tion are specific to continuous, linear finite elements on triangles, but extensions to higher spatial 
dimensions and/or higher order elements can be proved in a similar way (see e.g. [6]). 
The main result is the following theorem. A detailed proof can again be found in [19J. 

Theorem 3.7. Assume u G Hq(D) n C r {D), for some < r < 2. Then 

||0 - u h )(u,-)\\ L °°(D) < Amax ^ h r | In ft | \\u(u,-)\\ C rm)- 

If 1 < r < 2, we furthermore have 

\{u-u h ){u,-)\ w i,oo {D) < ^ max fr| h r - 1 | In ft | \\u{u),-)\\ CT(By 

Proof. (Sketch) Using the method of weighted norms by Nitsche, as is done in for example [6, §3.3], 
one can derive the quasi-optimality result 



(u-U h )(u),-)\\ L ao( D ) + h\(u - u h )(uj, -)\wi>°°(D) 



< 



Amax ^| inf -Vh\\L°°(D) + h\\nh\ \u(u, •) - v h \ w i,oo {D) ) , 

which holds for any h sufficiently small, and where again the dependence on A has been made 
explicit. The claim of the proposition then follows from the best approximation result 

inf \\u(u,-) -v h \\ L oo {D) < h r \\u(u),-)\\ C rffi), 

which can be found in e.g. |18j . and holds for all < r < 2. □ 

In order to conclude on the convergence of moments of ||(u— Uh)(ui, •) and \ (u—Uh)(uj, Olw 1 - 1 

it remains to prove a bound on moments of \\u(lo, OllcfS)' ^ or some < r < 2. One way to achieve 
this is to use the Sobolev Embedding Theorem (see e.g. [H §3.1]). We know from Theorem 
that u(u, ■) E H 1+S (D), for some < s < 1, which gives the following convergence rates. 



11 



Theorem 3.8. Let Assumptions A1-A3 be satisfied, for some < t < 1, and let < s < t be such 
that u e L p (n,H 1+s (D)) ; for all p < p*. Then 

\\u-u h \\ LP(n ^ m < C AJ h 1+s - d / 2 , Vs si. ^-Ks<l, 

\W - Uh\\LP(n,w^(D)) ^ C A j h s ~ d l 2 , Vs s.t. - < s < 1, 

/or o// p < p*, with Caj a finite constant dependent on A and /, 5u£ independent of h and u. 

Proof. This follows directly from Theoren i3.7l and the Sobolev Embedding Theorem. □ 

Alternatively, one can use Schauder theory to derive a bound on ||u(o;, •) \\(j r m^ directly, without 
going through the Sobolev Embedding Theorem. Theorems 8.33 and 8.34 in [IT] give the following. 

Theorem 3.9. Let Assumptions A1-A3 be satisfied, for some < t < 1 and p* > d/(l — t), and 
suppose D is a C 1+t domain. Then u(u, •) € C 1+t (D), and 

ll«( w >-)llci+*(D) ^ Cschaudor ( \\u(u, •) || c(2J) + •) || LP . (£))) , (3.6) 

where the constant C sc h a uder depends on A m i n (u;), A max (w) and ||A(a;, Ollc^D R dxd )- 

A similar result can again be proved for polygonal domains D, taking into account the singu- 
larities which can arise near the corners. The regularity of u, or more precisely the number r for 
which u(oj, •) E C r (D), will again depend on t and the angles in D (see e.g. [15, §6]). 

Theorem 13.91 suggests that \\u — u^Wi^mL 00 ^)) an d \\u — u/illjy^w 1 . 00 ^)) should converge 
with h l+t and h l , respectively. These rates are better than those proved in Theorem I3.8( and 
in particular, are dimension independent. To be able to conclude rigorously on these convergence 
rates, we would, as in Theorem 13.61 have to know exactly how the constant C sc h au der depends on 
A m i n (a;), A max (w) and || A(w, -)II(7*(Z) R dxd V Theorem 13.91 does, however, allow us to conclude on 
these higher convergence rates path wise (i.e. for almost all oj 6 f2, as in Theorem 13. 7p . 

The results in this section can be used in the same way as in f)3.3[) and (|3.4p to prove convergence 
rates a and (3 in Theorem [2J] for point evaluations. Using the fact that u(uj, •) £ C 1 (D) (cf Theorem 
13. 9|) for almost all u, we for example have for evaluations of the norm of the Darcy flux — AVn at 
a point x* G D 

E [||AV«(x*)| - \AVu he (x*)\\] < E [|A(z*)| \(Vu - Vu he )(x*)\] < E [A max \u - u he \ wl , x(D) } < C AJ hf, 

where a = s — d/2, if we use Theorem 13.81 or a = t, if we use the rates suggested by Theorem 13.91 
Similarly, we have 

V[||AVu(x*)|-|AV« ft ,(x*)||] <E[A 2 mSLX \u-u he \ 2 wl ^ (D) ] <C AJ h 2a , 

and so j3 = 2a, where a is as above. This can easily be generalized to point evaluations of the Darcy 
flux in a given coordinate direction. The proof for point evaluations of the pressure is also similar, 
and leads to convergence rates a = 1 + s — d/2 and j3 = 2(1 + s — d/2), if we use Theorem 13. 8} and 
a = 1 + t and j3 = 2(1 + t), if we use the rates suggested by Theorem 13.91 
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4 Conclusions 



We have considered the application of multilevel Monte Carlo methods to elliptic PDEs with random 
coefficients, in the important case of coefficients which are not uniformly coercive and bounded with 
respect to the random parameter. This includes, for example, log-normal random fields. Under 
minimal assumptions on the random coefficient, we have proven convergence of the multilevel Monte 
Carlo algorithm, together with an upper bound on its computational cost. We have shown that 
the convergence analysis in [20] holds also in the case of more general, tensor-valued coefficients, 
and also for point evaluations of the pressure and the flux. 
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